Improving Indonesian multiethnic speaker recognition using pitch shifting data augmentation

Abstract

Recognizing speakers from multiple ethnic groups is an interesting research topic in speaker recognition. Studies involving many ethnicities require the right approach to achieve optimal model performance. Deep learning has been applied to speaker classification with promising, high-accuracy results. However, multi-class and imbalanced datasets remain obstacles for various methods, causing overfitting and decreased accuracy. Data augmentation can overcome the problems of small data amounts and multi-class classification, and it can improve model quality depending on the augmentation applied. This study proposes pitch shifting data augmentation with a deep neural network (PSDA-DNN) to identify Indonesian multiethnic speakers. The experiments show that PSDA-DNN performs best for multiethnic speaker recognition, reaching 99.27% accuracy and 97.60% precision, recall, and F1 score.
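
The abstract does not say how the pitch shifting is implemented, so the following is only a rough sketch under stated assumptions, not the authors' PSDA-DNN pipeline. Shifting by n semitones multiplies every frequency by the ratio 2^(n/12). Here a phase vocoder first stretches the signal in time by that ratio, and linear resampling then restores the original duration, which moves each frequency by the same ratio. A hypothetical `augment` helper adds label-preserving shifted copies of each utterance. All function names, the FFT size, the hop length, and the semitone offsets are illustrative choices, not values from the paper.

```python
import numpy as np

# Hypothetical sketch: phase-vocoder pitch shifting for speaker-data augmentation.
# FFT size, hop, and semitone offsets are illustrative, not from the paper.


def stft(x, n_fft=1024, hop=256):
    """Hann-windowed short-time Fourier transform, one frame per row."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(x[s:s + n_fft] * window) for s in starts])


def istft(S, n_fft=1024, hop=256):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft)
    length = hop * (len(S) - 1) + n_fft
    out = np.zeros(length)
    norm = np.zeros(length)
    for t, spec in enumerate(S):
        start = t * hop
        out[start:start + n_fft] += np.fft.irfft(spec, n=n_fft) * window
        norm[start:start + n_fft] += window ** 2
    # Floor the normalizer so the tapered edges are attenuated, not amplified.
    return out / np.maximum(norm, 1e-3)


def phase_vocoder(S, rate, n_fft=1024, hop=256):
    """Time-stretch a spectrogram: rate < 1 lengthens, rate > 1 shortens."""
    n_bins = S.shape[1]
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft  # nominal phase advance per hop
    phase = np.angle(S[0])
    steps = np.arange(0, len(S) - 1, rate)
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, s in enumerate(steps):
        i, frac = int(s), s - int(s)
        # Interpolate magnitude between the two neighbouring input frames.
        mag = (1 - frac) * np.abs(S[i]) + frac * np.abs(S[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Wrapped deviation from the nominal advance gives each bin's true frequency.
        dphi = np.angle(S[i + 1]) - np.angle(S[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi
    return out


def pitch_shift(y, n_steps, n_fft=1024, hop=256):
    """Shift pitch by n_steps semitones while keeping len(y) samples."""
    if len(y) < n_fft + hop:
        raise ValueError("signal too short for the chosen n_fft and hop")
    ratio = 2.0 ** (n_steps / 12.0)
    S = stft(y, n_fft, hop)
    stretched = istft(phase_vocoder(S, 1.0 / ratio, n_fft, hop), n_fft, hop)
    # Resample by exactly 1/ratio (linear interpolation, no anti-aliasing filter).
    n_out = int(round(len(stretched) / ratio))
    t_new = np.linspace(0, len(stretched) - 1, n_out)
    shifted = np.interp(t_new, np.arange(len(stretched)), stretched)
    # Trim or zero-pad back to the original number of samples.
    out = np.zeros(len(y))
    n = min(len(y), n_out)
    out[:n] = shifted[:n]
    return out


def augment(signals, labels, semitones=(-2, -1, 1, 2)):
    """Label-preserving augmentation: one pitch-shifted copy per offset."""
    aug_x, aug_y = list(signals), list(labels)
    for x, label in zip(signals, labels):
        for n in semitones:
            aug_x.append(pitch_shift(x, n))
            aug_y.append(label)
    return aug_x, aug_y
```

For example, `pitch_shift(y, 12)` on a 1-second 220 Hz tone sampled at 16 kHz should yield a tone near 440 Hz of the same length. The resampling step uses plain linear interpolation for brevity; in practice an audio library such as librosa offers `librosa.effects.pitch_shift` with proper resampling.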

Similar sources

Pattern recognition in maintenance data using data mining methodologies (case study: Isfahan Regional Power Electric Company)

Maintenance and repair activities generate information that can help maintenance personnel determine idle times, provide a scheduled plan, or set failure alerts. When the volume of generated data is large, understanding the relationships among variables becomes very difficult. This thesis applies data mining to explore multidimensional databases in the maintenance domain in order to find failures that cause...

Improving Deep Learning using Generic Data Augmentation

Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating the training set with label preserving transformations. Recently there has been extensive use of generic data augmentation to improve Convolutional Neural N...

Pitch maxima for robust speaker recognition

This paper presents a novel approach to the design of a robust speaker recognition system. A noise-free synthesised spectrum is produced from a noisy spectrum. This synthesised spectrum is used for feature extraction. From noisy speech, the pitch is extracted using a robust pitch estimation algorithm. This also helps in identifying the voiced segments of speech which are the only ones considered...

Improving Children's Speech Recognition Through Out-of-Domain Data Augmentation

Children’s speech poses challenges to speech recognition due to strong age-dependent anatomical variations and a lack of large, publicly-available corpora. In this paper we explore data augmentation for children’s speech recognition using stochastic feature mapping (SFM) to transform out-of-domain adult data for both GMM-based and DNN-based acoustic models. We performed experiments on the Engli...

Improving the phase vocoder approach to pitch-shifting

A class of methods known as phase vocoders allows for implementing pitch shifting in the spectral domain. We extend the approach of shifting the isolated harmonics of the spectrum by introducing a new technique for separating the sinusoidal components. Keeping together the main lobe and the side lobes, which result from convolution of the harmonics with the spectrum of the analysis window in th...

Journal

Journal title: IAES International Journal of Artificial Intelligence

Year: 2023

ISSN: 2089-4872, 2252-8938

DOI: https://doi.org/10.11591/ijai.v12.i4.pp1901-1908